Unicode block

In Unicode, a block is a contiguous range of code points. Each block has a unique name, and blocks do not overlap. A block is specified by its starting and ending code points, and it may explicitly include unassigned code points and noncharacters.[1] Code points that do not belong to any named block, e.g. those in the unassigned planes 3–13, have the value block="No_Block".
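The definition above can be illustrated with a minimal sketch that reads block definitions in a "start..end; name" layout and checks the two stated constraints: unique names and no overlap. The sample lines, the regular expression, and the function names parse_blocks and check_blocks are assumptions made for this illustration and are not part of any Unicode API; only a handful of blocks are listed.

```python
import re

# A few sample block definitions, written as "start..end; name".
# This is a hand-picked excerpt for illustration, not the full data file.
SAMPLE = """\
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0F00..0FFF; Tibetan
2600..26FF; Miscellaneous Symbols
27F0..27FF; Supplemental Arrows-A
"""

# Hypothetical pattern for one definition line: two hex code points and a name.
LINE = re.compile(r"^([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6});\s*(.+)$")


def parse_blocks(text):
    """Return a list of (start, end, name) tuples sorted by start."""
    blocks = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        blocks.append((start, end, match.group(3).strip()))
    return sorted(blocks)


def check_blocks(blocks):
    """Raise ValueError if names repeat or ranges are invalid or overlap."""
    names = [name for _, _, name in blocks]
    if len(names) != len(set(names)):
        raise ValueError("block names must be unique")
    for start, end, name in blocks:
        if start > end:
            raise ValueError(f"{name}: start is after end")
    # Blocks are sorted by start, so only neighbours can overlap.
    for (_, end1, name1), (start2, _, name2) in zip(blocks, blocks[1:]):
        if start2 <= end1:
            raise ValueError(f"{name1} overlaps {name2}")


blocks = parse_blocks(SAMPLE)
check_blocks(blocks)
for start, end, name in blocks:
    print(f"U+{start:04X}..U+{end:04X}  {name}  ({end - start + 1} code points)")
```

Because each block is fully described by its two end points, the size of a block is simply end - start + 1, regardless of how many of those code points are actually assigned.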

Conversely, every assigned code point has a "Block name" property that identifies the block containing it. The value is determined by the code point alone, even though block names are descriptive, such as "Tibetan" or "Supplemental Arrows-A". Each assigned code point has exactly one block name.
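Since the block name depends only on the code point, a lookup reduces to a search over sorted, non-overlapping ranges. The following is a rough sketch under that assumption; the BLOCKS table is a small hand-picked excerpt and block_name is a hypothetical helper, not a function from Python's standard library.

```python
from bisect import bisect_right

# Small excerpt of block ranges, sorted by start (illustrative, not complete).
# Code points in real blocks that are missing from this excerpt will be
# reported as having no block.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0F00, 0x0FFF, "Tibetan"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x27F0, 0x27FF, "Supplemental Arrows-A"),
]
STARTS = [start for start, _, _ in BLOCKS]


def block_name(cp):
    """Return the block name for a code point (int or 1-character str)."""
    if isinstance(cp, str):
        cp = ord(cp)
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # Find the last block whose start is <= cp, then check its end.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        _, end, name = BLOCKS[i]
        if cp <= end:
            return name
    return "No_Block"


print(block_name("A"))       # Basic Latin
print(block_name(0x0F40))    # Tibetan
print(block_name(0x27F5))    # Supplemental Arrows-A
print(block_name(0x40000))   # No_Block
```

The function never inspects the character's name or other properties, which mirrors the point that the block is a function of the code point's position alone.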

Subdivisions, such as "Chess symbols" in the Miscellaneous Symbols block, are not blocks. Such subgroup names are informative editorial additions only.

Unicode blocks and contained scripts

Notes

  1. Unicode Blocks data file, as of Unicode version 6.0.
  2. UAX 24: Unicode Script Property (four-letter alphabetic code).
  3. UAX 24: Script data file.
  4. Including unassigned code points: noncharacter, reserved.
  5. The script has one or more characters in the block, as defined by the Script property. This is independent of the block name.
  6. "Common" (Zyyy) and "Inherited" (Qaai or Zinh) refer to scripts in ISO 15924.

References

  1. Unicode glossary